Minimize changes and fix commit logic.#2
Conversation
| return super.apply(); | ||
| if (base.currentSnapshot().snapshotId() == base.snapshot(targetSnapshotId).parentId()) { | ||
| // the snapshot to cherrypick is already based on the current state: fast-forward | ||
| validate(base); |
There was a problem hiding this comment.
how many times do you reckon validate(base) would be called per snapshot commit? I'm asking coz it's an expensive operations, validate is θ(2N) where N is # ancestors, which can grow monotonically.
There was a problem hiding this comment.
This PR only calls validate(base) once for each commit. So your PR doesn't add any further complexity to what we already had earlier in the original changes. Although, i'm just pondering on how this can impact commit performance more generally. (this doesn't have to be addressed in this pr)
There was a problem hiding this comment.
I don't think we need to be concerned about performance here. Tables usually have under a few thousand snapshots for us, and we have very frequent commits and a 7 day retention window. Even if we process a few thousand snapshots a couple times, it's not going to be much of a delay because these are all in memory.
Also, we have to process the snapshots for each new version because the set of valid snapshots may have changed. Every time the commit fails, this will retry and validate the latest table state, which is unavoidable. In the best case, that will happen just once, but it could be up to the number of retries configured for the table.
prodeezy
left a comment
There was a problem hiding this comment.
Had some concerns around average complexity based on how often validate(base) would be called but lgtm otherwise. Thanks for catching and addressing this!
rominparekh
left a comment
There was a problem hiding this comment.
@rdblue : I like the simplicity of the code change. Using TableMetadata from TableOperations was nice touch!
| updated = base.replaceCurrentSnapshot(newSnapshot); | ||
| } | ||
|
|
||
| if (updated == base) { |
This updates the cherry-pick addition with a few changes. I took a closer look at the commit logic to make sure it was correct and found that I had previously made some bad recommendations. That's why I thought it would be easier for me to fix it and open a PR.
One problem was that
SnapshotManagerwas keeping aTableMetadatainstance and aTableinstance and using those for validation. That's not a good practice because we should assume that the table metadata is changing.Tablewas passed in because I wanted to keep theSnapshotUtilAPI in terms ofTableand notTableMetadataorTableOperations, but I think it wasn't worth it because it confuses validation logic. Instead, I've moved the ancestor methods intoSnapshotManager. We can generalize this later if we decide to allowSnapshotUtilmethods to acceptTableMetadata.Another problem was that
applywas validating changes and then callingsuper.applyfor cherry-pick operations. Insuper.apply, the table metadata is refreshed, so it could change between validating the current state and applying the changes. To fix this, I addedapply(TableMetadata)that is called bySnapshotProducer.applyso that the validation is always done on the correct version.This also catches the case where a cherry-pick is actually a fast-forward and uses that commit instead.
Last, I made some minor modifications to
TableMetadatato minimize the overall changes.